
Encoding and communicating navigable speech soundfields

Abstract

This paper describes a system for encoding and communicating navigable speech soundfields for applications such as immersive audio/visual conferencing, audio surveillance of large spaces and free-viewpoint television. The system records speech soundfields with compact coincident microphone arrays. The recordings are then processed to identify the sources and their spatial locations, using the well-known assumption that speech signals are sparse in the time-frequency domain. A low-delay, Direction of Arrival (DOA)-based, frequency-domain sound source separation approach is proposed that requires only 250 ms of speech signal. Joint compression is achieved through a previously proposed perceptual analysis-by-synthesis spatial audio coding scheme, which encodes the sources into a mixture signal that a standard speech codec can compress at 32 kbps. Side information describing the original spatial location of each source is also transmitted, so the received mixtures can be decoded and then flexibly reproduced over loudspeakers at a chosen listening point within a synthesised speech scene. The system was implemented within this framework for an example application that encodes a three-talker navigable speech scene at a total bit rate of 48 kbps. Subjective listening tests evaluated the quality of the reproduced speech scenes at a new listening point against a true recording made at that point. The results demonstrate that the approach successfully encodes multiple spatial speech scenes at low bit rates whilst maintaining perceptual quality in both anechoic and reverberant environments.
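The abstract does not spell out the separation algorithm. The following is a minimal, generic Python sketch of how DOA-based time-frequency separation can work under the sparsity assumption, not the authors' method. It assumes (1) the coincident array yields first-order B-format-style STFT coefficients W (omni), X and Y (figure-of-eight) per time-frequency bin, (2) each bin is dominated by one talker, (3) a per-bin azimuth comes from the active-intensity estimate, (4) source directions come from greedy peak picking on an azimuth histogram, and (5) separation is a binary mask on W. All function names and parameters (`bin_width`, `min_sep`, `plane_wave_bin`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: (W, X, Y) STFT coefficients of one time-frequency bin.
Bin = Tuple[complex, complex, complex]


def bin_doa_deg(w: complex, x: complex, y: complex) -> float:
    """Azimuth in degrees [0, 360) from the active-intensity estimate atan2(Re{W*Y}, Re{W*X})."""
    ix = (w.conjugate() * x).real
    iy = (w.conjugate() * y).real
    return math.degrees(math.atan2(iy, ix)) % 360.0


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def find_source_doas(doas: Sequence[float], num_sources: int,
                     bin_width: float = 10.0, min_sep: float = 30.0) -> List[float]:
    """Greedy peak picking on an azimuth histogram: take the fullest histogram
    bins whose centres are at least min_sep degrees apart, up to num_sources."""
    nbins = int(round(360.0 / bin_width))
    hist = [0] * nbins
    for d in doas:
        hist[int(d // bin_width) % nbins] += 1
    centres: List[float] = []
    # sorted() is stable, so ties keep ascending-azimuth order.
    for i in sorted(range(nbins), key=lambda i: hist[i], reverse=True):
        if hist[i] == 0 or len(centres) == num_sources:
            break
        c = (i + 0.5) * bin_width
        if all(angular_distance(c, p) >= min_sep for p in centres):
            centres.append(c)
    return centres


def separate(bins: Sequence[Bin], num_sources: int) -> Tuple[List[float], List[List[complex]]]:
    """Binary-mask separation: each bin's omni (W) coefficient is assigned to the
    source whose estimated azimuth is closest to that bin's own DOA estimate."""
    doas = [bin_doa_deg(*b) for b in bins]
    sources = find_source_doas(doas, num_sources)
    outputs = [[0j] * len(bins) for _ in sources]
    if not sources:
        return sources, outputs
    for k, (b, d) in enumerate(zip(bins, doas)):
        nearest = min(range(len(sources)), key=lambda s: angular_distance(d, sources[s]))
        outputs[nearest][k] = b[0]
    return sources, outputs


def plane_wave_bin(s: complex, azimuth_deg: float) -> Bin:
    """Hypothetical helper: ideal first-order coefficients for one dominant source."""
    a = math.radians(azimuth_deg)
    return (s, s * math.cos(a), s * math.sin(a))


# Usage: 20 bins alternately dominated by talkers at 35 and 205 degrees.
bins = [plane_wave_bin(complex(1 + k, 0.5), 35.0 if k % 2 == 0 else 205.0) for k in range(20)]
sources, outputs = separate(bins, num_sources=2)
print(sources)  # → [35.0, 205.0]
```

Each list in `outputs` is a masked spectrum for one talker. A real system would invert it with an inverse STFT and pair it with the estimated azimuth as spatial side information. A binary mask is used here only because it is the simplest instance of sparsity-based separation.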